Many statistical situations in which it is customary to
employ hypothesis testing really involve a choice between
more than two decisions. In such problems, when the hypothesis
is rejected, one wants to know in which of a number of possible
ways the actual situation (true state of nature) differs from
the  one  postulated  by  the  null  hypothesis.  By  formulating  the 
problem  as  one  involving  only  two  decisions,  we  not  only  neglect 
to  differentiate  between  certain  alternative  decisions,  which  may 
differ  considerably  in  their  consequences,  but  we  may  also  be 
using  an  inappropriate  acceptance  region  for  the  hypothesis. 

The traditional approach to hypothesis-testing problems is
inadequate and unrealistic in the sense that it is not formulated
in a way that answers the experimenter's question, namely, how to
identify the best population. In fact, the method of least
significant differences based on the t-test has been used in the
past to detect differences between the true unknown means of
different varieties and thereby to choose the population which is
"best", i.e., the one with the largest mean. But this method is
indirect and less efficient, and it does not provide an overall
probability of a correct selection. If the null hypothesis is not
rejected, then no decision is made in the traditional approach.
Furthermore, when performing a test one may commit one of two
errors: rejecting the hypothesis when it is true (error of the
first kind) or accepting it when it is false (error of the second
kind). It is desirable to carry out the test in a manner which
keeps the probabilities of the two types of error to a minimum.

It is customary to assign a bound to the probability of incorrectly
rejecting the hypothesis when it is true, and to attempt to minimize
the other probability subject to this condition. Unfortunately,
when the number of observations is given, both probabilities
cannot be controlled simultaneously by the classical approach
(see Lehmann (1959)). The decision-theoretic approach provides an
effective tool to overcome these difficulties in some reasonable
ways. Actually, the cases described above can be formulated as
general multiple decision problems. To this end, we start by
defining the space $\mathcal{A}$ of actions of the statistician, consisting
of a finite number ($k \ge 2$) of points, $\mathcal{A} = \{a_1, \ldots, a_k\}$. There
are two distinct types of multiple decision problems that seem to
arise in practice. In one the parameter space $\Omega$ is partitioned
into k subsets $\Omega_1, \ldots, \Omega_k$ according to the increasing value of
a single real-valued function $r(\theta)$. The action $a_i$ is preferred
if $\theta \in \Omega_i$. This type of multiple decision problem is called
monotone. This approach has been studied by Karlin and Rubin
(1956) and Brown, Cohen and Strawderman (1976). For example,
if an experimenter is comparing two treatments with means $\theta_1$
and $\theta_2$, he might have available to him only a finite number of
actions among which he has preference based on the magnitude of
the difference between the means. A particular case occurs
when he may choose from the three alternatives:

(i)  prefer  treatment  1  over  treatment  2, 

(ii) prefer treatment 2 over treatment 1,

(iii)  no  preference  (cf.  Ferguson  1967). 
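
The three alternatives above can be illustrated with a minimal sketch of a monotone rule. It is not taken from the references cited: the function name `three_decision`, the cutoff `c`, and the convention that a larger mean is better are all assumptions made for illustration.

```python
def three_decision(mean1, mean2, c):
    """Monotone three-decision rule on the difference of two sample means.

    c > 0 is a hypothetical cutoff chosen by the experimenter; the rule
    assumes (for illustration only) that a larger mean is preferable.
    """
    if c <= 0:
        raise ValueError("cutoff c must be positive")
    diff = mean1 - mean2
    if diff > c:
        return "prefer treatment 1"
    if diff < -c:
        return "prefer treatment 2"
    return "no preference"
```

The rule is monotone in the sense that the chosen action moves in one direction only as the observed difference increases.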

Another important class of multiple decision problems for
selection is that in which the treatments are classified into a
superior category (the selected items) and an inferior one. In general,
selection problems have been treated under several different
formulations. A basic distinction corresponds to Model I and
II cases in the analysis of variance. In Model I, the treatments
being classified are considered fixed; only the observations
made on each of them are random. In Model II, on the other
hand, the treatments themselves are drawn at random from some
population and would, therefore, change under replications of the
experiment.

Lehmann (1961) and earlier authors (see Paulson (1949),
Bechhofer (1954), and Gupta (1956)) have formulated the selection
problems according to the following goals:

Goal 1. We wish to select only a single population (if possible
the best one): the variety to be planted, the production method
we are going to adopt, etc. As a slight generalization, we may
wish to select a fixed number, say two or three.

Goal  2.  The  number  of  populations  to  be  selected  (or  the  subset 
size)  is  not  fixed  in  advance  but  is  determined  by  the  observations. 
This  arises,  for  example,  when  we  wish  to  select  all  worthwhile 
treatments,  or  if  we  want  to  be  reasonably  sure  that  the 
selected  group  contains  the  best  treatment. 

Goal 3. The subset size is determined by the observed data subject
to an upper bound specified in advance. It may, for example, be
desirable to investigate all treatments that appear promising but
budget restrictions may limit the research program to the
investigation of at most three treatments.

Among the early investigators of multiple decision procedures
are Paulson (1949), Bahadur (1950), and Bahadur and Robbins (1950).
The formulation of such procedures in the framework of selection
and ranking procedures has generally been accomplished using either
the indifference zone approach or the (random-size) subset
selection approach. The former approach was introduced by
Bechhofer (1954). Substantial contributions to the early and
subsequent developments in subset selection theory have been
made by Gupta, starting with his work in 1956.


Bechhofer (1954) considered the problem of ranking k
normal means. In order to explain the basic formulation,
consider the problem of selecting the population with the
largest mean from k normal populations with unknown means
$\mu_i$, $i = 1,2,\ldots,k$, and a common known variance $\sigma^2$. Let $\bar{X}_i$,
$i = 1,\ldots,k$, denote the means of independent samples of size
n from these populations. The "natural" procedure (which has
been shown to have optimum properties by Hall (1959) and Eaton
(1967)) is to select the population that yields the largest $\bar{X}_i$.
The experimenter would, of course, need a guarantee that this
procedure will pick the population having the largest $\mu_i$ with a
probability not less than a specified level P*. For the problem
to be meaningful P* lies between $1/k$ and 1. Since we do not know
the true configuration of the $\mu_i$, we look for the least favorable
configuration (LFC) for which the probability of a correct
selection (PCS) is minimized. This LFC is given by $\mu_1 = \cdots = \mu_k$;
the corresponding PCS = 1/k and hence the probability guarantee
cannot be met whatever be the sample size n. A natural
modification is to insist on the minimum probability guarantee whenever
the best population is sufficiently superior to the next best.
In other words, the experimenter specifies a positive constant
$\Delta^*$ and requires that the PCS is at least P* whenever
$\mu_{[k]} - \mu_{[k-1]} \ge \Delta^*$, where $\mu_{[1]} \le \cdots \le \mu_{[k]}$ denote the ordered
means. Now the minimization of PCS is over the part
$\Omega_{\Delta^*}$ of the parameter space in which
$\mu_{[k]} - \mu_{[k-1]} \ge \Delta^*$. The complement is called the
indifference zone for obvious reasons. The LFC in $\Omega_{\Delta^*}$ is given
by $\mu_{[1]} = \cdots = \mu_{[k-1]} = \mu_{[k]} - \Delta^*$. The problem is then to determine
the minimum sample size required in order to have PCS $\ge$ P* for
all $\mu$ such that $\mu_{[k]} - \mu_{[k-1]} \ge \Delta^*$.

Bechhofer's formulation is more general than what is
described above. His general ranking problem includes, for
example, selection of the t best populations.
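
The sample-size determination in the indifference zone formulation can be sketched numerically. Under the LFC described above, with the largest sample mean selected, the PCS can be written as $\int \Phi^{k-1}(z + \sqrt{n}\,\Delta^*/\sigma)\,\varphi(z)\,dz$, where $\Phi$ and $\varphi$ are the standard normal distribution function and density. The code below is only an illustration under that assumption: the function names, the integration range, and the linear search over n are choices made here, not part of Bechhofer's paper.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pcs_lfc(k, n, delta, sigma, lo=-8.0, hi=8.0, steps=4000):
    """PCS at the least favorable configuration, by Simpson's rule.

    Evaluates the integral of Phi(z + sqrt(n)*delta/sigma)^(k-1) * phi(z).
    steps must be even; the range [lo, hi] is an illustrative truncation.
    """
    shift = delta * math.sqrt(n) / sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * norm_cdf(z + shift) ** (k - 1) * norm_pdf(z)
    return total * h / 3.0

def min_sample_size(k, p_star, delta, sigma, n_max=100000):
    """Smallest n with PCS >= p_star at the LFC (simple linear search)."""
    if not (1.0 / k < p_star < 1.0):
        raise ValueError("P* must lie strictly between 1/k and 1")
    for n in range(1, n_max + 1):
        if pcs_lfc(k, n, delta, sigma) >= p_star:
            return n
    raise ValueError("no n <= n_max meets the requirement")
```

With `delta = 0` the function returns approximately 1/k, which reproduces the observation that the guarantee cannot be met without an indifference zone. For k = 2 the integral reduces to $\Phi(\sqrt{n}\,\Delta^*/(\sigma\sqrt{2}))$, which gives a convenient check.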

In the subset selection approach, the goal is to select
a nonempty subset of the populations so as to include the best
population. Here the size of the selected subset is random and
is determined by the observations themselves. In the case of
normal populations with unknown means $\mu_1, \ldots, \mu_k$ and a common
known variance $\sigma^2$, the rule proposed by Gupta (1956) selects the
population that yields $\bar{X}_i$ iff

$$\bar{X}_i \ge \max_{1 \le j \le k} \bar{X}_j - \frac{d\sigma}{\sqrt{n}},$$

where $d = d(k, P^*) > 0$ is determined so that the PCS is at least P*.
Here a correct selection is the selection of any subset that
includes the population with the largest $\mu_i$. Thus, the LFC is
with regard to the whole parameter space $\Omega$. Under this formulation,
for given k and P* we determine the constant d. The rule
explicitly involves n. In general, the rule will involve
a constant which depends on k, P* and n. The performance of a
subset selection procedure is studied by evaluating the expected
subset size and its supremum over the parameter space $\Omega$.
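
A minimal sketch of this rule follows. It assumes the standard result that the infimum of the PCS for Gupta's rule is attained when all means are equal, where it equals $\int \Phi^{k-1}(z + d)\,\varphi(z)\,dz$; the constant d is then found by bisection. The function names, the bracketing interval [0, 20], and the 0-based population indices are assumptions made for illustration.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def equal_means_pcs(k, d, lo=-8.0, hi=8.0, steps=2000):
    """Integral of Phi(z + d)^(k-1) * phi(z) by Simpson's rule (steps even)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * norm_cdf(z + d) ** (k - 1) * norm_pdf(z)
    return total * h / 3.0

def solve_d(k, p_star, tol=1e-8):
    """Bisection for the smallest d with equal_means_pcs(k, d) >= p_star."""
    if not (1.0 / k < p_star < 1.0):
        raise ValueError("P* must lie strictly between 1/k and 1")
    lo, hi = 0.0, 20.0  # illustrative bracket; PCS is increasing in d
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if equal_means_pcs(k, mid) >= p_star:
            hi = mid
        else:
            lo = mid
    return hi

def gupta_subset(sample_means, d, sigma, n):
    """Indices (0-based) of populations retained by the rule."""
    cutoff = max(sample_means) - d * sigma / math.sqrt(n)
    return [i for i, m in enumerate(sample_means) if m >= cutoff]
```

The population with the largest sample mean always satisfies the inequality, so the selected subset is never empty.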

To find good, useful procedures, we look for optimal
rules within a small class of decision rules. Using the general
decision theory of Wald (1950), we can construct optimal selection
procedures according to several different approaches. The
indifference zone and subset selection approaches can be combined
to guarantee the infimum of the probability of a correct selection
over a preference zone and minimize the supremum of the expected
size of the selected subset over the parameter space (cf. Gupta and
Huang (1976)). In many cases, we do not know whether the true
parametric configuration belongs to the preference zone. If the
best and the second best are not very different, it is
reasonable to select a subset. We would like to keep the subset
size under control. This can be achieved by increasing the
sample size, which also increases the probability of a correct
selection. Sometimes we have some partial information about the
parameter space; in this case, the so-called $\Gamma$-minimax selection
rules have been studied by Gupta and Huang (1977).

In general, we are interested in introducing a wide class of
decision rules, the so-called subset selection procedures for the
k populations $\pi_1, \ldots, \pi_k$. We are given one observation $x_i$ from
each population $\pi_i$, $i = 1,2,\ldots,k$. The vector of observations
is $x = (x_1, \ldots, x_k)$. Each $\pi_i$ is characterized by a parameter $\theta_i$.
Let $\mathcal{A}$ denote the action space consisting of the $2^k$ subsets of the
set $\{1,2,\ldots,k\}$. A measurable function $d$ defined on $\mathcal{X} \times \mathcal{A}$ is
called a selection procedure provided that for each fixed k-vector
of observations $x \in \mathcal{X}$, $d(x,a) \ge 0$ and $\sum_{a \in \mathcal{A}} d(x,a) = 1$. Thus if $X = x$
is an observed vector of observations, $d(x,a)$ is the probability
of selecting the subset $a \in \mathcal{A}$. It can be seen that

$$\delta_i(x) = \sum_{a \ni i} d(x,a)$$ (summation over those subsets a containing i)

denotes the probability of selecting the ith population. The
functions $\delta_1, \ldots, \delta_k$ will be referred to as the individual
selection probabilities. Note that the selection procedure d
is completely specified by the individual selection probabilities
whenever the latter take on only the values zero and one.
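
The relation between a selection procedure and its individual selection probabilities can be made concrete with a small sketch. For one fixed observation vector x, the procedure is represented here as a dictionary from subsets to probabilities; this representation and the function name are assumptions made for illustration.

```python
def individual_selection_probs(d_x, k):
    """Individual selection probabilities for one fixed observation x.

    d_x maps subsets of {1, ..., k} (as frozensets) to the probabilities
    d(x, a); subsets not listed are taken to have probability zero.
    Returns a list whose (i-1)-th entry is the sum of d(x, a) over all
    subsets a containing i.
    """
    if any(p < 0 for p in d_x.values()):
        raise ValueError("d(x, a) must be nonnegative")
    if abs(sum(d_x.values()) - 1.0) > 1e-9:
        raise ValueError("d(x, a) must sum to one over the subsets")
    return [sum(p for a, p in d_x.items() if i in a) for i in range(1, k + 1)]

# Example with k = 3: a randomized procedure over three subsets.
d_x = {frozenset({1}): 0.5, frozenset({1, 2}): 0.3, frozenset({2, 3}): 0.2}
probs = individual_selection_probs(d_x, 3)
expected_size = sum(len(a) * p for a, p in d_x.items())
```

The sum of the individual selection probabilities equals the expected size of the selected subset for that x, since each subset a contributes d(x, a) once for every population it contains.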

In studying the performance of subset selection procedures,
the measures of loss most often used have been related to incorrect
selection, $ICS(\theta,a)$, where $\theta = (\theta_1, \ldots, \theta_k)$, and the number of
elements $|a|$ in the selected subset a. More recently, Goel and
Rubin (1977) studied the subset selection problem from a Bayesian
point of view using loss functions that are linear combinations of
$\theta_{[k]} - \max_{j \in a} \theta_j$ and $|a|$. Bickel and Yahav (1977) studied the behavior
of Bayes procedures as $k \to \infty$ using loss functions that are linear
combinations of $ICS(\theta,a)$ and $\theta_{[k]} - \sum_{j \in a} \theta_j / |a|$. Chernoff and
Yahav (1977), employing Monte Carlo techniques, compared the
integrated risks with respect to exchangeable normal priors of
Bayes procedures and Bechhofer type procedures using loss functions
that are linear combinations of $\theta_{[k]} - \max_{j \in a} \theta_j$ and $\theta_{[k]} - \sum_{j \in a} \theta_j / |a|$.
Gupta and Hsu (1978) carried out a Monte Carlo study which parallels that
of Chernoff and Yahav in that exchangeable normal priors are used
but differs in that the loss functions considered are linear
combinations of $ICS(\theta,a)$ and $|a|$. Several classical methods have
been compared by Gupta and Hsu (1978). The four loss combinations
that have been used are presented in Figure 1.

[Figure 1: diagram of the four loss combinations (1)-(4). Node labels
recoverable from the source: $ICS(\theta,a)$ and $\theta_{[k]} - \max_{j \in a} \theta_j$.]
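
The loss components named in the text can be computed directly for a given parameter vector and selected subset. The sketch below is illustrative only: the function name, the dictionary keys, the 0-based indices, and the convention that a selection is counted as correct when the subset contains any population attaining the largest parameter value are assumptions, not definitions from the papers cited.

```python
def loss_components(theta, a):
    """Loss components for parameters theta and a nonempty selected subset a.

    theta is a list of parameter values; a is a collection of 0-based indices.
    """
    if not a:
        raise ValueError("the selected subset must be nonempty")
    best = max(theta)
    selected = [theta[j] for j in a]
    return {
        # 1 if the best population is missing from the subset, else 0
        "ics": 0 if max(selected) == best else 1,
        # number of populations selected, |a|
        "size": len(selected),
        # theta_[k] minus the largest selected parameter
        "gap_to_best_selected": best - max(selected),
        # theta_[k] minus the average of the selected parameters
        "gap_to_selected_mean": best - sum(selected) / len(selected),
    }
```

A loss of the kind discussed is then a linear combination of two of these components, with weights chosen by the investigator.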


Note  that  the  different  combinations  have  different  interpretations. 
The  combinations  (1)  and  (2)  correspond  to  situations  where  the 
subset  selection  procedure  is  used  as  a  screening  procedure. 

For  example,  in  developing  a  new  drug,  a  pharmaceutical  company  may 
start  with  a  number  of  ingredients  known  to  have  beneficial  effects 
(and  side  effects)  from  previous  experiments,  and  then  obtain  a 
collection  of  potentially  good  formulae  for  combining  these 
ingredients  in  different  proportions.  After  the  first  stage  of 
testing,  one  wants  to  reject  those  formulae  that  are  evidently 
non-best  and  retain  those  that  still  seem  potentially  best  for 
further  study.  Eventually,  if  the  development  is  successful, 
only one formula will be marketed. Corresponding to this situation,
then, loss functions that depend only on the best selected and
the size selected are reasonable. On the other hand, the component
$\theta_{[k]} - \sum_{j \in a} \theta_j / |a|$ in the combinations (3) and (4) corresponds
to situations where all those selected will be used. This is the
case, for example, when one purchases stocks for long term
investment. One would purchase stocks of more than one company
to guard against the possibility of gross errors, and all the stocks
purchased contribute to the gain or loss.

Investigation of these problems more in the framework of
general multiple decision theory is a topic of current research
interest. The field of selection and ranking problems has grown
steadily over the years, as evidenced by the publication of
the following books: Bechhofer, Kiefer and Sobel (1968),
Gibbons, Olkin and Sobel (1977), Gupta and Panchapakesan (1979),
and Gupta and Huang (1979).

A related problem (see Goal 3), in which the number of
populations selected does not exceed a given number $m$ ($\le k$),
was solved by Gupta and Santner (1973) and Santner (1975). More
recently, Kiefer (1975, 1976, 1977) has made several significant
contributions in the very interesting area of multiple decision
problems based on conditional inference.